Author: Indumathi Pandiyan

Computer Vision Project (Module 2) submitted for PGP-AIML Great Learning on 29-May-2022

PART A - 20 Marks

DOMAIN: Entertainment
CONTEXT: Company X owns a movie application and repository that streams movies to millions of users on a subscription basis. The company wants to automate the retrieval of cast and crew information for each scene of a movie, so that when a user pauses the movie and clicks the cast information button, the app shows details of the actors in the scene. The company has in-house computer vision and multimedia experts who need to detect faces in screenshots from movie scenes.
The data labelling is already done.

DATA DESCRIPTION: The dataset comprises images and the corresponding masks for the human faces in them.
• PROJECT OBJECTIVE: To build a face detection system.

Steps and tasks: [ Total Score: 20 Marks]

1. Import and understand the data [7 Marks]

Import required libraries

A. Import and read ‘images.npy’. [1 Mark]
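Because each row of `images.npy` pairs an image with a list of annotation objects, the file is a NumPy object array, and `np.load` refuses to read it unless pickle loading is enabled. A minimal sketch; the helper name `load_face_data` is hypothetical:

```python
import numpy as np


def load_face_data(path="images.npy"):
    """Load the face dataset stored as a NumPy object array.

    allow_pickle=True is needed because the rows hold Python objects
    (image arrays and annotation lists) rather than plain numbers.
    Only enable it for files from a trusted source.
    """
    return np.load(path, allow_pickle=True)
```

Usage: `data = load_face_data()` followed by `data.shape` to confirm the number of rows.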

Checking the contents of each item in the array to understand the data

Observation: The array contains 409 images and is two-dimensional.

Understanding how the face mask coordinates are stored in the given NumPy array

Observation: The first column of the array holds the image data and the second column holds metadata about the face mask.

Displaying random images to understand the data

B. Split the data into features (X) and labels (Y). Unify the shape of all the images. [3 Marks]

Setting the image dimensions

C. Split the data into train and test sets [400:9]. [1 Mark]

D. Select a random image from the training data and display the original image and the masked image. [2 Marks]

Original Image

Masked Image

2. Model building [11 Marks]

Hint: 1. Use MobileNet architecture for initial pre-trained non-trainable layers.
Hint: 2. Add appropriate Upsampling layers to imitate U-net architecture.

A. Design a face mask detection model. [4 Marks]
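Following the two hints, a minimal U-Net-style sketch: a frozen, ImageNet-pretrained MobileNet acts as the encoder, and the decoder repeatedly upsamples and concatenates skip connections from the encoder. The skip-layer names are those used by `tf.keras.applications.MobileNet`; the choice of skip layers, the decoder filter counts and the helper name are assumptions, not necessarily the architecture used in this notebook.

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Concatenate, Conv2D, Reshape, UpSampling2D
from tensorflow.keras.models import Model

IMAGE_SIZE = 224  # assumed input size (MobileNet's default)

# Encoder outputs used as skip connections, deepest first: 14, 28, 56, 112 px.
SKIP_LAYERS = ["conv_pw_11_relu", "conv_pw_5_relu", "conv_pw_3_relu", "conv_pw_1_relu"]


def build_face_mask_model(weights="imagenet", trainable_encoder=False):
    encoder = MobileNet(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
                        include_top=False, weights=weights)
    for layer in encoder.layers:
        layer.trainable = trainable_encoder  # keep pre-trained layers frozen

    x = encoder.get_layer("conv_pw_13_relu").output  # 7x7 bottleneck
    for name, filters in zip(SKIP_LAYERS, (256, 128, 64, 32)):
        x = UpSampling2D()(x)  # double the spatial resolution
        x = Concatenate()([x, encoder.get_layer(name).output])  # U-Net skip connection
        x = Conv2D(filters, 3, padding="same", activation="relu")(x)

    x = UpSampling2D()(x)  # 112 -> 224
    x = Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel face probability
    x = Reshape((IMAGE_SIZE, IMAGE_SIZE))(x)  # match the (H, W) mask shape
    return Model(inputs=encoder.input, outputs=x)
```

Calling `build_face_mask_model()` downloads the ImageNet weights on first use; `weights=None` builds the same graph without them.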

The overall architecture of MobileNet, having 30 layers, is as follows:

B. Design your own Dice Coefficient and Loss function. [2 Marks]
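A minimal sketch of a soft Dice coefficient and a Dice loss written with TensorFlow ops so they can be passed to `model.compile`. Computing the score over all pixels in the batch, the `smooth` term and the `1 - Dice` loss form are assumptions; a common variant adds binary cross-entropy to the loss.

```python
import tensorflow as tf


def dice_coefficient(y_true, y_pred, smooth=1e-7):
    """Soft Dice over every pixel in the batch: 2|X∩Y| / (|X| + |Y|)."""
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    # `smooth` avoids division by zero when both masks are empty.
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)


def dice_loss(y_true, y_pred):
    """0 when prediction and ground truth match exactly; grows as overlap drops."""
    return 1.0 - dice_coefficient(y_true, y_pred)
```

Usage: `model.compile(optimizer="adam", loss=dice_loss, metrics=[dice_coefficient])`.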

Compile the model

C. Train and tune the model as required. [3 Marks]

D. Evaluate and share insights on performance of the model. [2 Marks]

Dice coefficient

Dice coefficient is defined as follows:

$$\text{Dice} = \frac{2\,|X \cap Y|}{|X| + |Y|}$$

X is the predicted set of pixels and Y is the ground truth.

A higher Dice coefficient is better. A Dice coefficient of 1 is achieved only when X and Y overlap perfectly. Because the denominator grows with the total number of predicted and ground-truth pixels, predicting extra pixels lowers the score; the metric is maximized by increasing the overlap between X and Y without adding false positives.
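A small worked example with toy numbers (not from the project data): a 4-pixel image whose ground truth Y covers 2 pixels. A prediction X of 3 pixels containing both face pixels, a prediction that marks all 4 pixels, and a perfect prediction score as follows. The extra predicted pixels add nothing to the overlap but enlarge the denominator:

$$
\begin{aligned}
\text{Dice}(X, Y) &= \frac{2 \cdot 2}{3 + 2} = 0.8 \\
\text{Dice}(X_{\text{all}}, Y) &= \frac{2 \cdot 2}{4 + 2} \approx 0.667 \\
\text{Dice}(Y, Y) &= \frac{2 \cdot 2}{2 + 2} = 1
\end{aligned}
$$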

Insight:

* On the test data the model's Dice coefficient is **0.5766**, whereas on the training data it is **0.97**, which is close to 1. <br>
* The Dice coefficient curves for training and validation show clear overfitting: the model performs much better on the training data than on the test data. <br>
* The Dice coefficient and loss stay almost constant across epochs; additional epochs bring little change. <br>

3. Test the model predictions on the test image: ‘image with index 3 in the test data’ and visualise the predicted masks on the faces in the image. [2 Marks]

Predicting masks for the entire test set (9 images)

Accessing the image at index 3

Predicted Mask
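To visualise a predicted mask on the face, one option is to threshold the sigmoid output and keep only the pixels labelled as face. A minimal sketch; the helper name and the 0.5 threshold are assumptions:

```python
import numpy as np


def apply_predicted_mask(image, predicted_mask, threshold=0.5):
    """Zero out pixels whose predicted face probability is at or below `threshold`.

    image: (H, W, C) array; predicted_mask: (H, W) array of probabilities.
    """
    binary_mask = (predicted_mask > threshold).astype(image.dtype)
    return image * binary_mask[..., np.newaxis]  # broadcast the mask over channels
```

Usage: `pred_masks = model.predict(X_test)` then `apply_predicted_mask(X_test[3], pred_masks[3])` for the image at index 3.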

Conclusion

The objective of the project is to build a face detection model. The given dataset is a NumPy array containing face images and the mask coordinates of the faces. The model uses a pre-trained MobileNet as its encoder, with upsampling layers added to imitate the U-Net architecture.

PART B - 10 Marks

DOMAIN: Entertainment

CONTEXT: Company X owns a movie application and repository that streams movies to millions of users on a subscription basis. The company wants to automate the retrieval of cast and crew information for each scene of a movie, so that when a user pauses the movie and clicks the cast information button, the app shows details of the actors in the scene. The company has in-house computer vision and multimedia experts who need to detect faces in screenshots from movie scenes.
The data labelling is already done.

DATA DESCRIPTION: The dataset comprises face images.

• PROJECT OBJECTIVE: To create an image dataset that the AI team can use to build an image classifier. Profile images of people are given.

Steps and tasks: [ Total Score: 10 Marks]

1. Read/import images from folder ‘training_images’. [2 Marks]

Observation: 1091 images were found in the given directory.
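A minimal sketch of collecting the image paths from the folder; the accepted extensions and the helper name are assumptions:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed image formats


def list_images(folder):
    """Return sorted paths of the image files directly inside `folder`."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)
```

Usage: `len(list_images("training_images"))` gives the image count.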

2. Write a loop which will iterate through all the images in the ‘training_images’ folder and detect the faces present on all the images. [3 Marks]

Hint: You can use ‘haarcascade_frontalface_default.xml’, which is available open source on the internet, to detect faces.

Haar Cascade Classifiers: The Haar cascade classifier is an effective object detection approach proposed by Paul Viola and Michael Jones in their 2001 paper “Rapid Object Detection using a Boosted Cascade of Simple Features”.

It is a machine-learning approach in which a cascade function is trained on many positive and negative images. The trained cascade is then used to detect the object in other images.

In practice, each trained cascade is stored as a large .xml file containing many feature sets, and each file targets one specific use case, such as frontal faces. [Reference link given below]
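The features stored in these files are Haar-like rectangle features, which Viola and Jones evaluate quickly with an integral image (summed-area table). As an illustration of that idea only, not of OpenCV's internals or of the boosted cascade itself, a minimal sketch of a horizontal two-rectangle feature; the helper names are hypothetical:

```python
import numpy as np


def integral_image(img):
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.asarray(img, dtype=np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] from four lookups, whatever the rectangle size."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def two_rect_feature(ii, y, x, h, w):
    """Horizontal two-rectangle feature: left half sum minus right half sum."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

Because every rectangle sum costs four lookups, thousands of such features can be evaluated per window, and the cascade rejects most non-face windows after only a few of them.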

For this problem, "haarcascade_frontalface_default.xml" is used to detect frontal faces.

Note: The haarcascade_frontalface_default.xml file was downloaded from GitHub, stored locally and loaded as the classifier face_cascade.

Understanding the images by viewing a random image

Converting the images to grayscale for the Haar cascade

OpenCV reads images in BGR order (blue, green and red channels), and processing three channels is more computationally intensive. The images are therefore converted to grayscale, which has a single intensity channel, before the Haar cascade is applied.

Viewing the grayscale image

Observations:

The above code displays the image with a rectangle drawn around the detected face.

Sample image in which the classifier could not detect a face

Loop for detecting faces in all the images

Trial run on the first 3 images

Looping through the entire dataset to detect the faces

Comments: The above code loops through the entire dataset and detects the faces.

3. From the same loop above, extract metadata of the faces and write into a DataFrame. [3 Marks]

Viewing the images in which no face was detected

4. Save the output Dataframe in .csv format. [2 Marks]

The DataFrame was successfully saved to a CSV file, which the AI team can use to build the classifier.

Comments: The coordinates of the detected faces are saved in the CSV file.

Conclusion:

References: https://towardsdatascience.com/computer-vision-detecting-objects-using-haar-cascade-classifier-4585472829a9

PART C - 30 Marks

1. Unzip, read and load the data (‘PINS.zip’) into the session. [2 Marks]
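A minimal sketch of extracting the archive with the standard library; the target folder and helper name are assumptions, and no particular folder layout inside `PINS.zip` is assumed.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path="PINS.zip", target_dir="PINS"):
    """Extract the archive and return the names of its top-level folders."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return sorted(p.name for p in Path(target_dir).iterdir() if p.is_dir())
```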

2. Write function to create metadata of the image. [4 Marks]

Function to identify the metadata

This function accepts the base path, the identity (directory) name and the individual file name as input, and returns the complete path to the image.
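A minimal sketch of such a metadata record; the class name `IdentityMetadata` and its attributes are hypothetical:

```python
import os


class IdentityMetadata:
    """One image of one identity inside the extracted dataset."""

    def __init__(self, base, name, file):
        self.base = base  # root folder of the extracted dataset
        self.name = name  # identity (directory) name
        self.file = file  # image file name

    def __repr__(self):
        return self.image_path()

    def image_path(self):
        """Complete path to the image: base/name/file."""
        return os.path.join(self.base, self.name, self.file)
```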

3. Write a loop to iterate through each and every image and create metadata for all the images. [4 Marks]

Function to load the metadata from the given path

Note: This function loops through all the folders in the extracted location and collects the path of each image in an array.

Location of the unzipped files

Comments: The extracted zip file contains 100 directories, one per celebrity.
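A minimal sketch of that loop, returning (identity, image path) pairs in a NumPy array. The accepted extensions and the helper name are assumptions.

```python
import os

import numpy as np


def load_metadata(base_path, extensions=(".jpg", ".jpeg")):
    """Return an array of (identity, image_path) rows for base_path/<identity>/<file>."""
    records = []
    for identity in sorted(os.listdir(base_path)):
        identity_dir = os.path.join(base_path, identity)
        if not os.path.isdir(identity_dir):
            continue  # skip stray files at the top level
        for file_name in sorted(os.listdir(identity_dir)):
            if os.path.splitext(file_name)[1].lower() in extensions:
                records.append((identity, os.path.join(identity_dir, file_name)))
    return np.array(records)
```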

Comments: The dataset contains 10770 images of the 100 celebrities in total, and metadata has been created for all of them.

4. Generate embedding vectors for each face in the dataset. [4 Marks]

Hint: Use ‘vgg_face_weights.h5’.

5. Build distance metrics for identifying the distance between two similar and dissimilar images. [4 Marks]

Calculating the distance between a pair of images

Showing image pairs and the distance between them
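A minimal sketch of one such metric, the squared Euclidean distance between two embedding vectors; the choice of metric is an assumption, and cosine distance is a common alternative.

```python
import numpy as np


def distance(emb1, emb2):
    """Squared Euclidean (L2) distance; smaller means more similar faces."""
    diff = np.asarray(emb1, dtype=np.float64) - np.asarray(emb2, dtype=np.float64)
    return float(np.sum(np.square(diff)))
```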

Distance between Similar images

Distance between Dissimilar images

Comments

The distance between similar images (the same person) is small, and the distance between dissimilar images (different people) is large.

Creating the train and test data

Encode labels
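A minimal sketch of both steps: encoding the identity names as integers and making a stratified split so each person appears in both sets. The 80:20 ratio, the seed and the helper name are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def split_and_encode(embeddings, identities, test_size=0.2, seed=42):
    """Encode identity names as integers and make a stratified train/test split."""
    encoder = LabelEncoder()
    labels = encoder.fit_transform(identities)
    X_train, X_test, y_train, y_test = train_test_split(
        np.asarray(embeddings), labels,
        test_size=test_size, stratify=labels, random_state=seed)
    return X_train, X_test, y_train, y_test, encoder
```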

6. Use PCA for dimensionality reduction. [2 Marks]

Dimensionality Reduction with PCA
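A minimal sketch: standardize the embeddings, then fit PCA on the training set only and apply the same transform to the test set. Keeping 128 components is an assumption.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def reduce_dimensions(X_train, X_test, n_components=128):
    """Fit scaler and PCA on training data only, then transform both sets."""
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=n_components).fit(scaler.transform(X_train))
    train_reduced = pca.transform(scaler.transform(X_train))
    test_reduced = pca.transform(scaler.transform(X_test))
    return train_reduced, test_reduced, scaler, pca
```

`pca.explained_variance_ratio_.sum()` shows how much variance the kept components retain.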

7. Build an SVM classifier to map each image to the right person. [4 Marks]

Model building and validation
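A minimal sketch of training and validating the SVM on the reduced features. The RBF kernel, `C=1.0` and `class_weight="balanced"` are assumptions to be tuned, for example with `GridSearchCV`.

```python
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC


def train_svm(X_train, y_train, X_test, y_test, C=1.0, kernel="rbf"):
    """Fit an SVM classifier and return it with its test accuracy."""
    clf = SVC(C=C, kernel=kernel, class_weight="balanced")
    clf.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, clf.predict(X_test))
    return clf, accuracy
```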

8. Import and display the test images. [2 Marks]

Hint: ‘Benedict Cumberbatch9.jpg’ and ‘Dwayne Johnson4.jpg’ are the test images

9. Use the trained SVM model to predict the face on both test images. [4 Marks]

Function to create the embedding

Display image with prediction
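A minimal sketch of the prediction step for one test image: its embedding goes through the fitted scaler, PCA and SVM, and the label encoder maps the result back to a name. It assumes those fitted objects already exist; the helper name is hypothetical.

```python
import numpy as np


def predict_identity(embedding, scaler, pca, clf, encoder):
    """Map one face embedding to the predicted person's name."""
    features = pca.transform(scaler.transform(np.asarray(embedding).reshape(1, -1)))
    label = clf.predict(features)[0]
    return encoder.inverse_transform([label])[0]
```

The returned name can then serve as the title when displaying the image.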

Conclusion: